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Abstract 

We present an adaptive method for the automatic scaling of Random-Walk Metropolis- 
Hastings algorithms, which quickly and robustly identifies the scaling factor that yields 
a specified overall sampler acceptance probability. Our method relies on the use of 
the Robbins-Monro search process, whose performance is determined by an unknown 
steplength constant. We give a very simple estimator of this constant for proposal distri- 
butions that are univariate or multivariate normal, together with a sampling algorithm 
for automating the method. The effectiveness of the algorithm is demonstrated with both 
simulated and real data examples. This approach could be implemented as a useful com- 
ponent in more complex adaptive Markov chain Monte Carlo algorithms, or as part of 
automated software packages. 

Some keywords: Adaptive Markov chain Monte Carlo; Robbins-Monro; optimal scaling; 
random- walk Metropolis-Hastings. 



1 Introduction 

Markov chain Monte Carlo (MCMC) algorithms are now routinely used in Bayesian sta- 



tistical inference. In particular, the Metropolis-Hastings algorithm (Metropolis et al. 1953 



Hastings 1970| | is highly popular due to its simplicity and general applicability. The 



most commonly implemented variant is the Random- Walk Metropolis-Hastings sampler 
(RWMH). The RWMH sampler uses a proposal distribution (most commonly, the Gaus- 
sian distribution) centered on the current value of the Markov chain, with some specified 
scale parameter > 0. The success and efficiency of RWMH depends heavily on the 
value of the scale parameter, which typically produces a smaller acceptance probability 
for a proposed move when it is large, and a larger acceptance probability when it is small. 



For target distributions of a certain form, Roberts et al. (19971 proved that an op 



timal acceptance rate should be 0.234 based on a multivariate proposal distribution. 
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Roberts and Rosenthal (2001 [ I suggest a value of 0.44 for a univariate proposal distribu- 



tion. Though these theoretical results were developed under fairly restrictive assump- 
tions, they have proven to work well for more general problems. Thus a common practice 
in MCMC is to manually tune acceptance probabilities by varying the scale parameter to 
achieve the above rates. In practice, the manual tuning of scale parameters becomes dif- 
ficult when the number of proposal distributions in the MCMC sampler is large, and can 
be further complicated by correlation between parameters. One should also avoid asking 
users to tune algorithm parameters, when developing software for general (non-expert) 
use. 

In this article we consider the use of a stochastic search algorithm - the Robbins- 
Monro process - to automatically tune the scale parameters. In essence, the resulting 
adaptive sampler will increase a if the previous MCMC proposed move was accepted 
and decrease a if the proposal was rejected. The amount by which a is changed (the 
step size) decreases linearly with the number of iterations in the Markov chain. Thus this 
adaptation procedure satisfies the diminishing adaptation criterion (|Roberts and Rosenthal 20091 
[Rosenthal 2010|) . In addition to the diminishing adaptation condition, any adaptive scheme 
also has to satisfy the containment (or bounded convergence) condition (|Bai et al. 2008|) in 
order to preserve the ergodicity of the Markov chain. This technical condition is satisfied 
for virtually all reasonable adaptive schemes ([Rosenthal 20101) , and is satisfied in our con- 
text since the step size will approach zero. The rate of convergence of the Robbins-Monro 
process depends on a steplength constant that controls the magnitude of the step size. 
Estimation of this constant is the focus of this paper. 

Andrieu and Thoms (2008| ) review a variety of adaptive MCMC methods, including 



several that use variants of the Robbins-Monro process. None of the algorithms they de- 
scribe attempt to estimate the optimal value of the steplength constant. One co-author of 
the review had proposed a method of estimating this constant (|Andrieu and Robert 2001]) , 
but a quantity that is critical to the method could only be estimated by combining infor- 
mation from three separate Markov chains, making the accuracy of the estimate question- 
able. The method is not mentioned in Andrieu and Thoms (2008] ). See e.g. [Roberts and Rosenthal (2009 1, 
Haario et al. (2005] ), Craiu et al. (2009| ), for other recent work on adaptive MCMC meth- 



ods. 

In Section |2l we introduce the Robbins-Monro process, and its use within the con- 
text of the MCMC framework. Section |3] details results on estimation of the optimal 
steplength constant for the Robbins-Monro process, while Section H] provides the rec- 
ommended adaptive optimal scaling algorithm. Simulated and real data analyses are 
performed in Section|5j We conclude with some discussion in Section[6j 



2 The Robbins-lVIonro process 

The standard situation for which the Robbins-Monro process was devised is the follow- 
ing. A binary response has probability of success p(cr), where cr is a parameter that can be 
controlled. It is assumed that p{a) is a monotonic function of a and here it is appropriate 
to suppose that the function is monotonically decreasing. This assumption usually holds 
in Random- Walk Metropolis-Hastings (and we make this assumption here), as a smaller 
scale parameter a generally corresponds to a larger acceptance rate p{(7), and vice versa. 
The aim is to find the value of a that gives a specified probability of success, say p* . Let 
a* denote this value, so that p{a*) = p* . The Robbins-Monro process conducts a stochas- 
tic search in which a sequence of trials is implemented. At each trial a is set equal to 
the current estimate of a* . If the result of the trial is a success then the estimate of a* is 
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increased and, if it is a failure, the estimate is reduced. Let cjj denote the estimate of a* 
at the i-th trial, i = 1,2, . . .. The Robbins-Monro process ([Robbins and Monro 195l"|l steps 
from ai to fjj+i according to the rule 

_ J ai + c{l — p*)/i if the z*^ trial is a success 
'"'"^ I (Tj — cp*/« if the z*^ trial is a failure, 

where c > is a chosen constant. Each value of ai is expected to be closer to a* than 
the preceding one (|Garthwaite and Buckland 1992|) . The size of steps is controlled by c, 
commonly referred to as the steplength constant. 

The optimum choice of the steplength constant is c* = —l/[dp{a)/da]a=cr*, where 
the derivative is evaluated at the target value a = a* |Hodges and Lehmann 1955). The 
method has good asymptotic properties ( [Hodges and Lehmann 1955[ISchwabe and Walk 19961 
IWetherill 1963|) . In particular, as i ^ oo, ai — a* is asymptotically normally distributed 
with a mean of zero and a variance of p*(l — p*)c^c* /i{2c — c*), provided that c > c*/2. 
If c = c* is set equal to its optimal value, then the asymptotic variance of (j, equals the 
Cramer-Rao lower bound to the variance of any non-parametric unbiased estimator of a* 
(|Wetherill 1975|l . Moreover, as noted by |Wetherill (1963] ), the asymptotic variance is rela- 
tively insensitive to the precise value chosen for c, especially if c overestimates c* so that 
steps are larger than their optimal size: the variance is one-third greater than its lower 
bound when c = 2c* or c = 2c* /3. In general, the optimal value c* is not known and must 
be estimated. 

In the context of the Metropolis-Hastings algorithm, suppose the posterior target dis- 
tribution is /(x) oc /*(x), where /*( . ) is known. Let g{- \ x, a) be the proposal distribu- 
tion when currently at x, where cr is a scale parameter to be identified using the Robbins- 
Monro process. Define p(x, a) to be the probability of accepting a proposed move from 
X, generated from g{- \ x, a). We assume that for any x, p(x, a) is a monotonic decreas- 
ing function of a. That is, we assume that the acceptance probability does not increase 
as the variance of the proposal distribution increases. Under the above scenario, a trial 
consists of generating a value from the proposal distribution. Accepting the proposed 
value equates to a success; not accepting it to a failure. When at x, the probability of suc- 
cess in the Robbins-Monro process, p{a), is given by j>(x, a), the acceptance probability 
in the Metropolis-Hastings algorithm. However, as x varies over the target distribution, 
and because p(x, a) varies with x as well as a, we define p{a) as the overall acceptance 
probability (OAF) of the sampler, p{a) = f p(x, a) /(x) dx. The aim is then to find a value 
a = a*, for which the OAF has some specified value, p{a) = p*. 

Specifically, we have that 

/■ . f /#(y)g(x|y,a) \ 
p X, a) = / mm <^ „ r, l>g{y\ x, cr dy. 

J l/*(x)5(y |x,cj) J 

In the case of a symmetric proposal, g{y | x, a) = g(x | y, a) for all x, y, a, and 

pia) = 1 1 minjl^M, l| ^(y I x, a) /(x) dy dx. (1) 

Hence, under standard regularity conditions 

dp{a) f f . f f*{y) ^\ f dg{y\^,a) \ 

In this article, we restrict attention to the case where g{y \ x, a) is a univariate or multi- 
variate normal distribution with mean x and variance A, and so g{y | x, cr) = g{-K | y, cr). 
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The quantity dp{a)/da, evaluated at a*, determines the optimal value of the steplength 
constant. For the moment we assume that the covariance matrix A is known. However, 
we relax this in Section l4!2l where we simultaneously estimate A along with a* within a 
standard adaptive random walk Metropolis algorithm (|Haario et al. 200T|) . 

However, even in the usual Robbins-Monro context it is difficult to estimate the steplength 
constant from variation in p{a) (|Wetherill 19631 Ruppert 1991 1. In the present context, at- 
tempting to estimate c* from variation in p(x, a) is orders of magnitude harder, as p(x, a) 
is as sensitive to change in x as to change in o". In the following section we develop a 
procedure that avoids this problem. 



3 Estimation of the steplength constant 



Garthwaite and Buckland (1992[ ) and |Garthwaite (1996 1 provide an algorithm for finding 



confidence limits in Monte Carlo tests, in which the steplength constant is not estimated 
through variation in p{<j). Instead, the estimate of c* is based on the distance between 
the current estimate of one endpoint of the confidence interval and the point estimate 
of the quantity for which the interval is required. The ratio of this distance to the opti- 
mal steplength constant, c*, is reasonably similar across a broad range of distributions - 
sufficiently similar to provide an adequate estimate of the steplength constant. For the 
Metropolis-Hastings algorithm, results in this section suggest that, in a similar fashion, 
the relationship between c* / a* and p* may be sufficiently similar across distributions for 
c* to be adequately estimated from a* and p* . Propositions 1-3 below motivate the esti- 
mator of c* that we propose for univariate target distributions. Propositions 2 and 3 are 
proved in an appendix. 

Proposition 1 Suppose that (/(y|x, a) is an m-dimensional multivariate Gaussian proposal dis- 
tribution, y ~ MyAf(x, cr^A), where A does not depend on a. Then a lower hound on c* /a* is 

{mp*)~^. 

Proof: Differentiation of g{y\ x, a) = (27rcr2)^™/2| A|~i/2 gxp{-i(y - x)'A-i(y - x)/a'^} 
gives dg{y\y:, a) /da = {cr~^ (y — x)'A~^(y — x) — ma~^}g{-y\:x.,a). Substituting this 
expression in Equation ^ gives 

dp{a)/da = —mp{a)/a + (p, (3) 

where 



a-^l I minjl^M, l| (y - xyA-i(y - x) ^(y | x, a) /(x) dy dx. 



(4) 



Since (j) > is positive, dp{a)/da > —mp{a)/a. It follows that c*/a* > (mp*) ^, as 
c* = -l/[dp{a)/da]a-=cT* andp{cr*) =p*. □ 

Proposition 2 Let m = I and suppose the conditions of Proposition^\hold and that /(•) has 
finite variance. Then a* /a* — > as a* — > oo, where ((T*)^A is the variance of the proposal 

distribution. 



Proof: In appendix. 

Proposition 3 Suppose that the proposal distribution is the multivariate Gaussian distribution 
defined in Proposition^and the target distribution /(x) is continuous over the support ofx. If 
a ^ Oas p{a) 1, then c* /a* ?a 1/(1 - p*) as p* 1. 
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Figure 1: Plots of c* /a* against p* for nine univariate distributions: (dotted lines) stan- 
dard normal, t (with 5 d.f.), uniform, logistic and double exponential distributions, a 
Gamma(5, 1) and Beta(3, 7) distribution, a normal mixture ^A^(0, 1) + ^Af(5,5), and 
(dashed line) a standard Cauchy. Solid line denotes the relationship c* /a* = — 

p*)]. 



Proof: In appendix. 

We are required to adopt some relationship between c* / a* and p* that will be taken 
as representative of the relationship for target distributions in general. Under mild regu- 
larity conditions on the target distribution, p* — )^ only as a* — > oo. Hence, when m = 1, 
Proposition |2] implies that the relationship should satisfy c* /a* — > \/p* as p* 0. Also, 
from Propositions [T] and m c* /a* should exceed l/{mp*) for allp* and c* /a* 1/(1 —p*) 
as p* —7- 1. Thus, when the target distribution is a univariate distribution, so that m = 1, 
a natural relationship to consider is 

c7^* « ,V (5) 

p*(\ — p*) 

as this is the simplest function that meets these conditions. 

We examined the relationship between c* /a* and p* for a broad range of univariate 
target distributions, based on a univariate Gaussian random walk proposal with vari- 
ance a^. Specifically these distributions were the standard normal, t (with 5 d.f.), Cauchy, 
uniform, logistic and double exponential (/(x) = O.Se"'^', — oo < x < oo) distribu- 
tions, a Gamma(5, 1) and Beta(3, 7) distribution, and a mixture of two normal distri- 
butions: ^A^(0, 1) + ^A^(5, 5). For each target distribution, and for a range of values of 
p* € [0.05,0.95], Monte Carlo methods were used to determine c*/a* by first solving 
Equation ^ for a and then evaluating dp /da at that value of a, using Equation Large 
sample sizes were used to ensure that Monte Carlo variability was negligible. 

Figure[T]plots c* /a* against p* (dotted lines) for each of these distributions. The close- 
ness of the nine lines indicates that the relationship is broadly similar across these dis- 
tributions, although the values of c*/a* for the Cauchy distribution (dashed line) are 
slightly larger than the others for low p* . The solid line in Figure [T] is the relationship 
between c* /a* and p* given by equation The figure indicates that (O is a good choice 
for this relationship, although many other choices would also be satisfactory and could 
be made without greatly affecting the performance of the method. A useful feature of ^ 
is that it generally yields an estimate of c* that is a little too large rather than too small: it 
is better to overestimate the steplength constant than to underestimate it. 



5 



Returning to multivariate distributions, Roberts et al. (199^ considered m-dimensional 
target distributions of the form 



/{■k) = h{xi)h{x2) . . .h{Xrn) (6) 

for some one-dimensional smooth density h, where x = (xi, . . . , Xm)'- They showed that 
if the proposal distribution is an m-dimensional Gaussian distribution, y ~ MVN{-k, a^Im), 
thenp(cr) = 2<^{-aBm^/'^/2) as m — )■ oo, where i3 > is a constant that depends on h. 
Roberts and Rosenthal (2001 [ I derive similar results for the case where the target distribu- 
tion is a multivariate Gaussian and the proposal distribution is an m-dimensional multi- 
variate normal distribution, y ~ MFA^(x, cj^A), or if /(x) = YYh=i Cih{CiXi), where the 
{Ci\ are i.i.d draws from some fixed distribution. They show that as m — > oo, the value 
oip{a) tends to 2^{—j3a) for some positive constant /3. The following proposition gives 
c*/(7* whenever p((t) has this form. 

Proposition 4 Suppose that p{a) = 2^{—j5o), where (3 > Oisa constant and $ is the cdfofthe 
standard normal distribution. Then 

c*/a* = (27r)^/2 exp(aV2)/(2a), (7) 

where a = —^~^{p*/2). 

Proof: Differentiating p(cr) = 2 J~^{2TTy^/'^ exp{-z^ /2)dz gives dp{a)/ da = -2p{2-K)'^l^ 
exp{-(/3o-)^/2). Write a = fia* , so that a = -$~^(p*/2). The proposition follows as 

c*/a* = -l/{a*[dp{a)/d<j]„=„,}. □ 

To gain an impression of how c* /a* varies with the dimension, m, for multivari- 
ate target distributions, we consider both target distributions of the form ((6]l, and m- 
dimensional multivariate-t distributions on v degrees of freedom. For fixed p* , and 
with a multivariate Gaussian random-walk proposal distribution with covariance matrix 
(T^Im, experimentation indicated that c* /a* was close to a linear function of 1/m* . Here, 
m* = m for target distributions of the form (O, and m* = min{m, v] for the multivariate-t 
target distributions. If we assume that Proposition|4]holds as m* oo, and that Equation 
(|5]) also holds when m* = 1, then this determines the linear function as 

(2vr)V^e"V^ + 1 (8) 

2a m*p*{l — p*) ' 

where a = -^^^{p* /2). 

To examine the usefulness of ^ for target distributions of the form /(x) = YYi^i ^{xi), 
each of the univariate distributions considered in Figured] was taken in turn as the com- 
ponent distribution /i(-). As before, Monte Carlo methods were used to determine c*/cj* 
for a range of values of p* G [0.05, 0.95], and the resulting relationships are shown in 
Figure |2] for m = 2, 4, 8 and 20 dimensions. In each panel (a)-(d) the highest (dashed) 
line corresponds to the Cauchy distribution and the dotted lines to the other target dis- 
tributions. For each model dimension, the relationship between c* /a* and p* is similar 
for all distributions. The solid line illustrates the linear function (|8]l, which exhibits a 
strong similarity with the other curves, implying it models the relationship across model 
dimensions well for these models. As with the univariate case (Figured]), the form of (|8]) 
generally represents a practically convenient overestimate of c*. 
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Figure 2: Plots of c* /a* against p* for m-dimensional multivariate distributions of the 
form /(x) = ni^i f^i^i) where h{-) is given by the nine univariate distributions in Figure 
[T] (dotted lines). Dashed line denotes the standard Cauchy distribution. Solid line denotes 
the relationship given by (HJ. Panels (a)-(d) represent ?n = 2, 4, 8 and 20 dimensions. 



To examine the usefulness of ^ for multivariate t target distributions, similar calcu- 
lations were performed for various combinations of model dimension, m, and degrees of 
freedom, v. The results are presented in Figure |3] for m* = 1 (panel a) and (b) m* = 4 
(panel b), where in each case the solid line illustrates the relationship given by (H). In 
Figure |3la) the top curve is from the Cauchy distribution {m = l,iy = 1) and the bottom 
curve is actually two near-identical curves given hy {m = A,v = 1) and {m = l,u = 4). 
In Figure [S^b) all curves are visually indistinguishable. Hence, using m* = min{m, v} 
appears sensible for t distributions, and, as such, provides strong support for (|8]) as a 
representation of the relationship between c* /a* and p*. 

In order to use Equation ([8]), the value of m* must be specified. However, considera- 
tion of multivariate t-distributions with few degrees of freedom indicates that common 
choices of p* {p* = 0.44 for a univariate target and p* = 0.234 for a multivariate target 
of moderate or large dimension) are not always appropriate. For example, with a multi- 
variate Cauchy target distribution, /*(y)//*(x) is the ratio of two multivariate normal 
distributions multiplied by the ratio of two imivariate normal distributions. Choosing p* 
on the basis of the dimension of the multivariate distributions is a poor approach as this 
ignores the latter ratio. The considerations necessary for choosing p* should also give a 
suitable choice of in*. 

If dc is the estimate of a* after i steps of a Robbins-Monro search with a steplength 
constant of c, then var(3'c 

) = p*{i - p*)c^ /{i{2c/c* - 1)} ( [Hodges and Lehmann 1955l l. 



This is minimized when c = c* so the efficiency of a search is defined to be 

X 100% = i^^ZiiV X 100%. (9) 

var(cjc) 

Efficiency declines slowly if c* is overestimated: efficiency is 75% when c = 2c* and c has 
to exceed more than 3.4c* before efficency drops below 50%. As such, if a suitable choice 
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Figure 3: Plots oi c* /a* against p* for univariate and multivariate t distributions with v 
degrees of freedom. Panel (a) shows (m = 1, ;/ = 1), (m = 4, i/ = 1) and {m = l,u = 4), 
so that m* = 1. Panel (b) shows (m = 4, i/ = 4), (m = 4, = 8) and {m = 8,u = 4), so 
that m* = 4. Solid line denotes the relationship given by (H)). 



for m* is unclear, setting m* to a small value (such as m* = 1) should not be a disaster 
as c* will be overestimated. When the choice of m* is unclear, the length of the Markov 
chain burn-in phase might be doubled, which would more than compensate for a drop 
in efficiency of up to 50%. 

In summary. Figures [U-El show that c* / a* is largely determined by the values of p* 
and m* for a range of distributions, and Propositions 1-4 suggest that this should hold 
more generally. Equation ^ gives a good estimate of c* /a* and it will be used in our 
implementation of the Robbins-Monro process within an MCMC sampler. That is, when 
a is the estimate of a* that the search currently gives, we will put 

I \ m* J 2a ■m*p*{l—p*) 

as the steplength constant for the next step of the search. 

When the steplength constant is given by dlOl ), calculation shows that the efficiency 
of the search for each of the distributions considered in Figure [T] is at least 96% when 
p* = 0.234 and at least 91% when p* = 0.440. In the context of forming confidence in- 
tervals, Garthwaite and Buckland (1992) found the efficiency of the Robbins-Monro was 
generally noticeably lower (but adequate), typically around 70-80% for 95% confidence 
intervals. The improved efficiency here is because the proposal distribution is fixed as 
a multivariate normal distribution and this largely determines the optimal value of the 
steplength constant. 




4 Search algorithms for adaptive optimal scaling 

In this section we describe the implementation of the Robbins-Monro search process 
within MCMC algorithms. We initially consider univariate and multivariate target dis- 
tributions where all model parameters are updated simultaneously using Metropolis- 
Hastings updates. We then consider a multivariate target within the Metropolis-Hastings 
within Gibbs sampler framework. The performances of the algorithms are examined via 
examples in Section 5. 
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4.1 Univariate target distributions 

For the univariate target distribution f{x) oc /*(rc), under a Gaussian RWMH sam- 
pler with proposal distribution y ~ N{x,a'^), we aim to use the Robbins-Monro pro- 
cess to determine the value of o" = cr* so that the OAP of the sampler is p*. Following 
Roberts and Rosenthal (200T] | we suppose that p* = 0.44 is appropriate. 

The basic strategy is to continually improve the estimate of a* at each step of the 
Markov chain. If denotes the estimate of a* after the i-th step of the search, then we set 

_ f ai + c{l-p*)/i if y is accepted 
'"''^ \ o-j — cp* /i if y is rejected 

where c = ai/{p*{l — p*)} (c.f. equation (O). Starting values for a search can be ar- 
bitrary (e.g. o"! = 1) or more considered, such as an estimated standard deviation of 
/(•). Either way, ai need not be well-chosen, as the Robbins-Monro process can be mon- 
itored and a search restarted if the starting value seems poor (e.g. IGarthwaite 19961 
IMatsui and Ohashi 1999|l . Otherwise — > a* can take a long time to converge as the 
step size decreases with i. 

On a restart, the most recent estimate of a* is taken as the starting value and the value 
of i is reset. Note that we start (and restart) a search with i = no, where no is a moderate 
size so as to avoid too rapid steplength changes in the early stages of a search (e.g. steps 
would halve in size between i = 1 and i = 2). We choose no to be the integer closest to 
5/{p*(l — p*)}, which works well in practice. We also choose to restart the search if the 
estimate of a* changes by a factor of 3 from its value from when the search started (or last 
restarted). Many other criteria for restarts would also work well, as the only requirement 
is to restart if a poor starting value has been used. 

It seems conceivable that a search might oscillate between a tripling in value and 
reducing in value by two-thirds, so that the search is continually restarting (though this 
has never happened in our experience). To ensure that this cannot happen, our algorithm 
records the number of restarts resulting from a tripling and the number resulting from 
it reducing by two-thirds. Should both these numbers reach 5, then the process is not 
restarted again. We also do not restart if more than 100 steps have been taken since the 
last restart, as taking 100 steps without restarting suggests a reasonable starting point has 
been used. These decision rules are obviously arbitrary to a degree, but they work well 
in practice in that they seldom need to be enforced. At the same time, the rules mean 
that the size of steps definitely shrink to zero, so the procedure satisfies the diminishing 
adaptation criterion. 



4.2 Multivariate Metropolis-Hastings updates 

We now consider a multivariate target distribution, /(x) oc /*(x), where all components 
of X are updated simultaneously using a Gaussian random walk proposal MVN{-k, a^A), 
for some positive-definite matrix A, where the purpose of the Robbins-Monro search 
is to find a value of a that gives an OAP of p*. Following Roberts et al. (1997) we set 



p* = 0.234. The search procedure is fundamentally the same as for the univariate target 
distribution. A starting value for a is again required and an arbitrary value may be cho- 
sen, the value for no is determined in the same way as before, and the same method 
of monitoring and restarting searches is followed. Other details differ slightly. The 
steplength constant is determined by Equation |(T0)) . so 

(12) 
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where a = — <l>^^(p*/2). Usually, the value of m* should equal m, the dimension of x. 
However, if j components of x are related by some variate (e.g. a variance in the case of 
t-distributions) that has v degrees of freedom, and v < j, then we suggest reducing m* 
hy j — V. This is in line with the results illustrated in Figure |3l If there is doubt as to the 
appropriate value of m* , the analysis in Section |3] advocates setting m* too small rather 
than too large. 

Turning to the choice of A, in many circumstances the convergence rate of the Markov 
chain can only be optimized if A is proportional to the covariance matrix of x. This 
covariance matrix, 5] say, is typically unknown, but its value may be estimated as the 
Markov chain runs JCraiu et al. 2009]) . After each iteration of the chain, A may be set 
equal to the current estimate of S. Thus we can set 

_ r In., . Z < 100 

' I ih E;=i(x, - x.)(x, - X.)' ^ > 100. ^'"^ 
where the computation for updating Sj is reduced by using the recursions 

Xj = - [{i - l)xi_i +Xj] 



and 



2- - -/ « _ _/ 1 , 

"^j-l +Xj_iXj_]^ ; rXjXj + - -XjXj. 



^ — 1 I — \ I — 1 

Haario et al. (2001) suggest Xl^ + el„j (where e > 0) as a positive-definite estimate of I]. In 
our implementation we follow this approach with e = crf/i, so that Aj = + aflm/i as 
the estimate of S. Thus, MVN{-k, afAi) is used as the proposal distribution after i steps 
of the Robbins-Monro search. 

When the dimension of x is large, a substantial number of iterations of the Markov 



chain may be needed before the estimate of S stabilizes. Roberts and Rosenthal (2009 1 
give examples where 400,000 iterations were needed when m = 100, and nearly 2 million 
were needed when m = 200. For the Robbins-Monro process to converge to the correct 
value of a*, it must not converge before the estimate of S stabilizes. Similarly, many 
sampler iterations are typically needed to effectively explore the parameter space when 
there are many parameters, and proposal acceptance probabilities may vary dramatically 
over this space. Hence, even after the estimate of S is fairly stable, the estimate of a 
should converge slowly if it is to reflect the overall optimum for the parameter space. 

To achieve this, we propose that the magnitude of steps should not be reduced below 
some pre-fixed limit imtil the estimate of I] is reasonably stable, and after that the stepsize 
should be reduced slowly. In our algorithm, for i > 200 we set 

_ J CTj + c(l — p*)/max{200, i/m} if y is accepted , , 

\ — cp*/max{200, i/m} if y is rejected. 

which works well in practice. There are many alternative choices that would also be 
satisfactory Monitoring of the traceplots for the parameters should be carried out to 
ensure that estimates of S have stabilized. 



4.3 Metropolis-Hastings within Gibbs 

In Metropolis-Hastings within Gibbs, x is partitioned into components, x' = (x'^^, . . . , x^ ), 
and each component is sampled sequentially (conditional on other components), us- 
ing Gibbs sampler component updates where possible, but otherwise using Metropolis- 
Hastings updates. In the latter case, we suppose the proposal distribution is of the form 
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N{xk, (T^) for scalar components, or MVN{-Xk, cr^Afc) for vectors, k = 1, . . . , j. The ap- 
propriate value of (7^ varies from component to component, so if there are q < j com- 
ponents that require a Metropolis-Hastings sampler, then q Robbins-Monro searches are 
conducted while the sampler runs, each one searching for a* for one particular compo- 
nent. The searches are conducted using the methods given in Sections 4.1 and 4.2. 

5 Examples 

5.1 Univariate target distributions 

The search algorithm given in Section WA\ was applied in turn to each of the nine univari- 
ate target distributions considered earlier (c.f. Figure [l]). Two hundred samplers were 
run for each target distribution, and a search for the value of a* that gave an OAF of 0.44 
was conducted within each chain. The starting point for each search was obtained by 
randomly setting ai ~ Exp(l). As a reasonable aim of the search process is that it should 
give a good estimate of a* within 2000 steps, each chain was run for 2000 iterations. The 
final estimate of a* and the acceptance rate of the sampler over the last 1000 iterations 
was recorded for each chain. The results are summarized in Tabled] 

Table 1: Sampler performance based on 200 chain replicates of length 2000 iterations, 
under a target acceptance probability of p* = 0.44, for each of the nine univariate target 
distributions in Figured] Optimal (true) values of a*, and 0.05, 0.5 and 0.95 quantiles of 
the empirical distribution of the estimates of a* are given, together with quantiles of the 
sampler acceptance rates in the last 1000 iterations. 



Target 
distribution 


Optimal 

a* 




a* quantile 




OAF quantile 


0.05 


median 


0.95 


0.05 


median 


0.95 


iV(0,l) 


2.42 


2.31 


2.43 


2.56 


0.417 


0.443 


0.468 


t-dist. (5 d.f.) 


2.71 


2.54 


2.73 


2.89 


0.413 


0.441 


0.470 


Cauchy 


4.39 


3.69 


4.25 


5.03 


0.389 


0.443 


0.501 


Logistic 


4.05 


3.82 


4.05 


4.33 


0.417 


0.442 


0.467 


Double exponential 


2.70 


2.52 


2.70 


2.93 


0.413 


0.439 


0.465 


Gamma(5,l) 


4.98 


4.62 


4.96 


5.28 


0.414 


0.443 


0.467 


Beta(3,7) 


0.335 


0.311 


0.335 


0.355 


0.417 


0.440 


0.466 


Uniform 


0.806 


0.764 


0.807 


0.849 


0.418 


0.442 


0.464 


iiV(0,l) + iiV(5,5) 


6.07 


5.59 


6.10 


6.50 


0.415 


0.442 


0.468 



For each of the nine univariate distributions, the second column in Table d] provides 
the theoretical value of a* and the next three columns give the 0.05, 0.50 and 0.95 quantiles 
of the final estimate of a* from each search. The last three columns present the same 
quantiles of the acceptance rates in the last 1000 steps of each search. The results indicate 
that the Robbins-Monro search has low bias and good accuracy: the median estimate of 
a* is close to a* for each of the nine distributions, the median values of the OAF are close 
to their target of 0.44, and the 0.05 and 0.95 quantiles for both a* and the OAF are quite 
close together. This performance clearly exceeds requirements, as the efficiency of the 
Metropolis Hastings algorithm is not sensitive to the precise value of p*. 
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5.2 Multivariate Metropolis-Hastings updates 



We follow the example of [Roberts and Rosenthal (2009 1 in which the target distribution 



is /(x) = MVN{0,'S), where T, = MM' and M is an m x m matrix whose elements 
are generated randomly from a A^(0, 1) distribution. This target distribution is somewhat 
pathological, in that typically XI will be close to singular, and so we refer to this as the ill- 
conditioned case. We also consider a modified target distribution, in which each diagonal 
element of T, is increased by 1%, and refer to this as the better-conditioned case. It is perhaps 
more representative of target distributions that arise in practice. Sampling from these 
target distributions is difficult when the dimension m is moderate to large, especially 
with the ill-conditioned case. Here we set m = 50. 

For each case we consider three versions of the RWMH sampler. The first version (the 
RM method) is the algorithm proposed in Section 4.2, where the proposal distribution has 
the form MVN{xi-i,afAi), where cjj is estimated using the Robbins-Monro procedure 
and Aj = Sj + aflm/i, with Xl^ empirically estimated from Equation ((13)) . In theory, the 
optimal proposal distribution is A/FA^ (xj_i, 2.38^S/m) ((Roberts et al. 1997)) . The second 
sampler version (the Optimal method) implements a RWMH sampler with this (fixed) pro- 
posal distribution. This sampler is unavailable in practice, as the true value of S is typ- 
ically unknown, but its performance provides a benchmark for the other methods. The 
final sampler version {the fixed-scaling method) implements the RWMH sampler with co- 
variance matrix 2.38^ Aj/m, where Aj is estimated as in the RM method. This sampler is 
part way between the RM and the Optimal methods. 

Ten replicate samplers, each of length 100,000 iterations, were run for both the ill- 
conditioned and the better-conditioned case, and for each of the sampler variants. Results 
are based on discarding the first half of a chain as burn-in. For the Robbins-Monro algo- 
rithm, we set the desired overall acceptance probability (OAF) equal to p* = 0.234, no = 
20 and m* = 50. As a measure of algorithm efficiency, we follow Roberts and Rosenthal (2009 [ ) 



in monitoring the integrated auto-correlation time (ACT) (e.g. [Roberts and Rosenthal 2001)1 . 
We also monitor the average squared jumping distance (ASD) between the iterates of the 
chain. A smaller value of the ACT indicates less auto-correlation, and hence greater effi- 
ciency. Similarly, the larger the jumping distance, the faster the mixing of the chain. All 
ACT and ASD values are calculated using full length of the chain (including the burn-in 
period). 

A summary of the results is provided in Table |2l which includes specific results for 
the first coordinate, xi. With the better-conditioned case, results for all three methods 
are highly satisfactory. The RM method consistently estimates cj^ with good accuracy 
and the OAPs for all methods are close to the target value of 0.234. The Optimal method 
benefits from the imrealistic advantage of knowing S, and it has a noticeably better ASD 
than the other methods. However, all three methods give good estimates of the mean and 
standard deviation of xi (whose true values are and 7.48 respectively). Results for the 
ill-conditioned case are more diverse. The estimates of S are poor by construction, which 
hinders the RM method and the fixed-scaling method, both of which must use these 
estimates. The RM method compensates by setting to a low value and this enables the 
method to attain the target acceptance probability of 0.234. In contrast the fixed-scaling 
method achieved an OAF of only 0.14. 

Figure ID shows the ability of the RM method to quickly find the value of cr^ that gives 
the desired OAF for both better- (left panels) and ill-conditioned (right panels) cases. 
The top panels illustrate the estimates of (cr*)^ for one of the chains from Table |2j the 
horizontal lines correspond to the optimal value, a = 2.38/m^/^. The bottom panels dis- 
play the corresponding running acceptance rates over a window of 500 iterations of the 
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Table 2: Summaries of RWMH sampler performance under three sampler variants 
using a multivariate normal proposal distribution, for both better- and ill-conditioned 
cases. Parentheses indicate Monte Carlo standard errors based on 10 sampler replicates. 
Columns correspond to the mean value of cr^; the overall acceptance probability (OAP); 
posterior mean and standard deviation for the component xi, the integrated autocorrela- 
tion times (ACT) for the parameter xi; and the average squared distances (ASD) between 
the iterates of the parameter xi. 



Statistics for xi 

Version of OAP mean sd ACT ASD 

Better-conditioned case 

RM method 0.126(0.01) 0.233 (0.001) 0.14(0.33) 7.09(0.18) 77.98(1.08) 1.12(0.09) 

Optimal S 0.113 ( - ) 0.239 (0.001) 0.09 (0.30) 7.60 (0.17) 75.08 (0.83) 1.48 (0.01) 

Fixed-scaling 0.113 ( - ) 0.258 (0.001) -0.01 (0.44) 7.01 (0.22) 78.29 (1.03) 1.08 (0.09) 



Ill-conditioned case 

RM method 0.070(0.001) 0.233(0.006) 0.12(0.52) 6.63(0.28) 89.13(0.93) 0.47(0.04) 

Optimal S 0.113 ( - ) 0.239 (0.001) -0.02(0.34) 7.41 (0.14) 74.20(0.93) 1.45(0.02) 

Fixed-scaling 0.113 ( - ) 0.141 (0.001) -0.14 (0.48) 6.37 (0.33) 89.05 (0.56) 0.44 (0.03) 



chain. In each chain the estimate of cj^ stabilized within about 3000 iterations; thereafter 
the acceptance rate was close to the target of 0.234. The diagrams illustrate that the RM 
method performed well with these high-dimensional target distributions, even in the ill- 
conditioned case. 



5.3 Metropolis-Hastings within Gibbs: Respiratory Infection in Children 

We apply our adaptive MCMC algorithm to an example involving respiratory infection 
in Indonesian children (Diggle et al. 1995 ILin and Carroll 2001|) . The data contains lon- 



gitudinal measurements on 275 Indonesian children, where the indicator for respiratory 
infection is the binary response. The covariates include age, height, indicators for vita- 
min A deficiency, gender, stunting and visit numbers (one to six). Previous analyses have 
shown the effect of age of the child to be non-linear, and so we use a Bayesian logistic ad- 
ditive mixed model of the form 

logit{P(respiratory infection.^. = 1)} = /Sq + Ui + P^X.j + /(age.^ (15) 

for 1 < i < 275 children and 1 < j < rii repeated measures within a child. The random 
child effect is Ui N{0, afj), Xjj is the measurment vector of the remaining 11 covariates. 



and / is modelled using penalized splines with spline basis coefficients Uk ~ -^(0, cr^ 



We follow Zhao et al. (2006| l and |Fan et al. (2008[ ) and apply hierarchical centering to 



the random effects. All continuous covariates are standardised so that the choice of hy- 
perparameters can be scale independent. Radial cubic basis functions are used to fit the 
covariate age 

/(age) = /3o + /3iage + Z.g^u 

where 

Z,ge = [|age-Kfc|3][|Kfc-Kfc/|3]-i/2 and ur^N{0,all) 

1<K<K l<k,k'<K 
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Iterations 



Iterations 




Figure 4: Trace-plots of (top panels) and the corresponding running acceptance rates 
(bottom panels) over the previous 500 iterations, using the RM method. Left (Right) 
panels display results for the better- (ill-) conditioned cases. 



with Kfc = (;^^)th quantile of the unique predictor values, where K is chosen to be 20. 
We use a vague prior for the fixed effects. For both variance components, we use an 
inverse gamma prior with equal scale and shape parameters. Previous Bayesian analyses 
(e.g., IZhao et al. 2006|l showed this to be a robust choice for scale (and shape) parameter 
values of 0.01 and larger. 

We consider two variants of Metropolis-Hastings within Gibbs samplers. The first 
(a full-conditional approach) systematically cycles through all 306 parameters, using a 
RWMH algorithm with proposal distribution N{x\_y^ erf), for each t = 1, . . . 306, and 
implementing separate Robbins-Monro searches with the aim of finding values of 
that give acceptance probabilities of 0.44. The second scheme (a block-conditional ap- 
proach), block-updates the eleven /3 parameters and the twenty knot components {nk} 
respectively via the multivariate Gaussian proposal distributions A^(x-^^^, cj^^j Ac) and 

' ^fk)-^k), where the superscripts c and k refer to the coefficient component and 
knot component of x. The desired acceptance rate of the associated Robbins Monro 
searches is 0.234 in each case. The remaining 275 parameters are updated using the full- 
conditional approach as before. In both schemes the two variance components are up- 
dated using the Gibbs sampler, and all algorithmic conditions follow those outlined in 
Sections O and 1121 

The sequential Monte Carlo sampler of Fan et al. (2008| l fits the model ((T5|l using pe- 
nalized quasi-likelhood (PQL) (|Breslow and Lin 1995|) to obtain an approximate maxi- 
mum likelihood estimate of the covariance matrix of the 306 parameters. To provide com- 
parison with the Robbins-Monro method, we implement the full conditional approach 
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Full conditional 



Full conditional 



Full conditional Full conditional 



PQL RM PQL 



Figure 5: Boxplots of overall acceptance probabilities (left panels) and cr^ values (right 
panels) for all parameters and blocks, based on the second half of the MCMC sampler 
output. Top and bottom panels indicate full-conditional and block-conditional updates 
respectively. 



but using the diagonal elements of the PQL matrix as (fixed) o"^ values. For the block- 
conditional approach, the appropriate blocks of the PQL matrix were also used instead 
of Ac and A^, where the values of cr^^^ and cr^^^ are set to the optimal values of 0.51 and 
0.28 respectively. For all samplers, chains of length 10,000 and 50,000 were used for the 
full- and block-conditional approaches, respectively. 

Results for all univariate proposal distributions are summarized in Figure [5] Over- 
all acceptance probabilities (left panels) and the final estimates of (a*)^ (right panels) for 
each of the (306 or 275) parameters are illustrated for both the full-conditional (top pan- 
els) and block-conditional approach (bottom panels). Within each panel, the left boxplot 
displays results for the Robbins-Monro method, and the right boxplot the PQL matrix 
approach. 

Both Robbins-Monro samplers led to acceptance probabilities that were very close to 
the target of 0.44; the OAP ranged from 0.425 to 0.501 for the 306 proposals run for the 
full-conditional method and from 0.442 to 0.472 for the 275 univariate proposals of the 
block-conditional method. To achieve these rates the RM search varied erf substantially; 
the highest final mean value (based on the second half of the chain) of a1 was 38.806 
and the lowest final mean value was 0.529. In marked contrast, the method based on 
PQL gave OAP values which varied far more than with the RM method. For the full- 
conditional method, the OAP given by PQL varied from 0.598 to 0.991 - frequent and 
substantially different deviations from the ideal target. 

The Robbins-Monro searches were also effective with the multivariate proposals. The 
target OAP was 0.234 and it gave an OAP of 0.235 for the block of eleven fi coefficients 
and 0.230 for the block of 20 knots {Kk}. 

6 Discussion 

This paper exploits the Robbins-Monro process to give a method for automatically tun- 
ing the scaling factor of the Random-Walk Metropolis-Hastings MCMC sampler. The 
method was implemented in a search algorithm and its effectiveness illustrated. When 
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the random walk covariance matrix was estimated jointly with the scaling factor, our 
search algorithm for the scaling factor converged so quickly that we had to deliberately 
slow it down to give time for the covariance matrix to be estimated. This speed of con- 
vergence suggests that the algorithm may make a useful component of more complex 
MCMC methods. For instance, recently proposed MCMC algorithms (|Craiu et al. 20091 
[Roberts and Rosenthal 2009|l offer regional adaptation in order to improve chain mixing 
when a global scale parameter is inadequate. These methods require a different proposal 
distribution to be estimated depending on which region of the parameter space the chain 
is in currently. Hence the speed with which the proposal distribution is estimated in 
each region is likely to be an issue, making use of Robbins-Monro searches an attractive 
approach. 

Our search algorithm satisfies the diminishing adaptation criterion ([Roberts and Rosenthal 2009]l , 
in that changes to the scale parameter vanish as the length of the MCMC chain goes to 
infinity. An interesting possibility would be to relax this requirement by fixing a (small) 
minimum value for the size of scale parameter changes. This could allow continuous 
adaptation of the proposal distribution as the Markov chain moves around the parameter 
space. Existing theory would need to be extended to determine when the chain converges 
to its target distribution, but theoretical results might be obtainable as the Robbins-Monro 
process is itself a Markov chain, so that the Robbins-Monro process and an MCMC chain 
together still form an MCMC chain. 
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Appendix: Proof of Propositions |2] and |3] 

Proof of Proposition El / min(/*(y)//*(x), I) g{y \:x.,a) dy < 1, so 

1 1 min(^|^, 1^ x'xg(y|x,a)/(x)dy(ix< J x'x/(x)dx. (A.l) 

As /(.) has finite variance, both sides of (A.l) are finite. Similarly, as g{y j x, o") = g(x [ y, a), 
we have that / / min(/#(y)//#(x), 1) y'y g{y \ x, a) /(x) dy dx is also finite. It follows 
that (f) in equation (4) is 0{a~^). 

Let S{r) be an m-dimensional sphere of radius r, centred at the mean of /( . ). Let 
S'^{r) = 17 — S{r) denote its complement. Given any e, choose r such that /^c)-^-) /(y) dy < 
e. Thenmp(cj) ?a /^^^^ min(/#(y)//#(x), 1) m5r(y | x, a) /(x) dy dx and 

\immp(a)^[ [ min f ^^J^, 1^ m (27r)-'"/V-"^ lAr^/^ /(x) dy dx, (A.2) 
JnJs{r) V/*(x) J 

since lirrio-^oo ^(y I x, cr) — > (27r)~™/^(T~™ lAI"-"^/^ for y € S{r). As mp{a) is non-zero 
for finite a, mp{a) is 0((T~™). Hence, limo-^ool*?^ — mp{a)/a} = limfj^oo{—'mp{o')/cr} if 
m = 1, since is 0{a~^). Then, from equation (3), lim.a-^oo dp{a) / da = —mp{a)/a and 
the proposition follows from c*/(T* = —l/[adp{a)/da]cj=a*- Q 

Preliminary lemma. Let (/(.j x, a) be a multivariate normal distribution and let x and A 
be fixed m x 1 vectors. Define the region i? by ii = {y : A'(y — x) < 0}. Then 

/ A'(y - x){d<7(y| x, a)/da}dy = a'^ [ A'(y - x)5(y| x, a)dy. (A.3) 

JR J R 
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Proof of lemma. For i = 1, . . . , m and j = 1, . . . ,m, let bij be the (i, j) element of A ^ 
and let Xi, yi and Aj denote the ith elements of x, y and A. Suppose Aj > and let h{i) = 
- ^jiVj - Xj)/>^i- Integrating by parts, /_^J^^{A'(y - x)}{(j"3 ^^..^y. _ ^.^^y. _ 

x,)}.5(y|x,a)% = U{Y-^)}{-a-\y,-x,)}.g{y\^,<y)tl^l_^ + /'S '^"HA'ly - x) + 
\{Vi - Xi)}.g{y\ X, a) dyi = + f^^^ cr~HA'(y - x) + \i{yi - Xi)}.g{y\ x, a) dyi. Thus 

/ {A'(y-x)}{cr~^ V6ij(yi -Xi){yj - Xj)}.g(y\yL,a)dy 

Jr y 

= / a-^{y{y-^) + Xi{yi-Xi)}.g{y\j^,a)dy. (A.4) 
Jr 

Equation (A.4) also holds for Xi < 0, so /R{A'(y - x)}{c7"^ J2i J2j bijivi - Xi){yj - Xj)} 
.g{y\x,a)dy = Jj^{m + l)a-^{X'{y-x)}.g{y\x,a)dy. Since dg{y\x, a) /da = {(J-^Y^i Ej 
hjiVi — Xi){yj — Xj) — ma~^}.g{y\ x, a), the lemma follows. □ 

Proof of Proposition|3l As p{a) — > 1, u — > 0. Then g{y\ x, u) — > except when 1 1 y — x| | — > 
0. Let X be fixed. For small || y — x||, we may put /(y) ~ /(x) + A'(y — x)/(x), where 
A does not depend on y. Then /*(y)//*(x) < 1 if and only if A'(y — x) < 0. Let 
= {y : A'(y - x) > 0} so that is the complement of R. As a ^ 0, 

/ ^) ■ I ^' ^ ^' Jj-^ ^ - ~ x))5r(y| X, a) dy 

= 1 + / A'(y-xMy|x,a)(iy. (A.5) 
Jr 

Differentiating equation (A.5), as a ^> 0, 

f . ( f*{y) S\ dg{y\yL,a) f , dg{y\:si,a) 

= ^-i I A'(y-x)g(y|x, a)dy, (A.6) 
Jr 

from the lemma. From equations (A.5) and (A.6), 

[ . ff*{y) ,\ , I , , 1 ^ /■ • ff*(y) A dg{y\x,a) ^ 

Now, from Equation[TJ 

/ / ( /#(x) ' ^) • 5(y I X, 0-) /(x) dy dx = p(o-) 
and, from Equation|2j 

Hence, as p(o-) 1,p(ct) l + cr{(ip(cr)/(icr}, soc* = -l/[(ip(fT)/d(j]cr=CT* ~ □ 
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